39 research outputs found

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules

Author: Bertolazzi Paola
Cestarelli Valerio
Felici Giovanni
Fiscon Giulia
Weitschek Emanuel
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, specifically on case-control studies, using rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). By contrast, our goal is to extract more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

Learning to classify species with barcodes

Author: Bertolazzi Paola
Felici Giovanni
Weitschek Emanuel
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers

Author: Dimitris Polychronopoulos
Emanuel Weitschek
Giovanni Felici
Philipp Bucher
Slavica Dimitrieva
Yannis Almirantis
Publication date: 01/01/2014

Little work has been done on analyzing the composition of conserved non-coding elements (CNEs), which are identified by comparing two or more genomes and are found in all metazoan genomes. Here we analyze CNEs with a methodology that takes into account word occurrence at various length scales, in the form of a feature vector representation, together with rule-based classifiers. We apply our approach to both protein-coding exons and CNEs from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences, combined with rule-based classification methods, leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes, along with surrogates, are presented. © 2014 Elsevier Inc. All rights reserved.
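To make the idea of a multi-length word-occurrence feature vector concrete, here is a minimal Python sketch. It is not the representation used in the paper: the function name `word_frequency_vector`, the word lengths 1 to 3, and the per-length normalisation to relative frequencies are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def word_frequency_vector(seq, lengths=(1, 2, 3), alphabet="ACGT"):
    """Concatenate the relative frequencies of every possible word of each length.

    Hypothetical helper for illustration; the word lengths and the normalisation
    are assumptions, not the paper's settings.
    """
    seq = seq.upper()
    vector = []
    for k in lengths:
        # Count overlapping words of length k (sliding window, step 1).
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        # Fixed, lexicographically ordered vocabulary so vectors are comparable.
        words = ["".join(p) for p in product(alphabet, repeat=k)]
        # Windows containing other symbols (e.g. N) are not in the vocabulary,
        # so they are left out of both the features and the total.
        total = sum(counts[w] for w in words)
        vector.extend(counts[w] / total if total else 0.0 for w in words)
    return vector


features = word_frequency_vector("ACGTACGT")
print(len(features))  # 4 + 16 + 64 = 84 features
```

Each sequence becomes a fixed-length numeric vector regardless of its own length, which is the kind of input on which a rule-based classifier can then learn thresholds.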

Open Access Repository

LAF: Logic Alignment Free and its application to bacterial genomes classification

Author: Cunial Fabio
Felici Giovanni
Weitschek Emanuel
Publication date: 01/01/2015

Alignment-free algorithms can be used to estimate the similarity of biological sequences and are therefore often applied to the phylogenetic reconstruction of genomes. Most of these algorithms compare the frequencies of all distinct substrings of a fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms to assign biological samples to their taxa. The method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We apply LAF successfully to the classification of bacterial genomes into their corresponding taxonomy. In particular, we obtain reliable classification at different taxonomic levels by extracting a handful of rules, each based on the frequencies of just a few k-mers. State-of-the-art methods that adjust k-mer frequencies to the character distribution of the underlying genomes have a negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.

Peer reviewed
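The LAF abstract describes classification models written as disjunctive-normal-form formulas over k-mer relative frequencies. The Python sketch below only shows how such a rule could be evaluated on a single genome; it does not implement LAF's search for a minimal k-mer subset or its rule learning. The names `kmer_profile` and `satisfies_dnf`, the choice of k in {2, 3}, and the thresholds in `HYPOTHETICAL_RULE` are all assumptions.

```python
from collections import Counter


def kmer_relative_frequencies(seq, k):
    """Relative frequency of each distinct overlapping k-mer occurring in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def kmer_profile(seq, ks=(2, 3)):
    """Merge the relative frequencies for several k into one lookup table."""
    profile = {}
    for k in ks:
        profile.update(kmer_relative_frequencies(seq.upper(), k))
    return profile


# A DNF formula is a list of clauses joined by OR; each clause is a list of
# literals joined by AND; each literal is (k-mer, operator, threshold).
# This particular rule is hypothetical and was not learned from any data.
HYPOTHETICAL_RULE = [
    [("GC", ">", 0.10), ("AT", "<=", 0.05)],
    [("TTA", ">", 0.02)],
]


def satisfies_dnf(profile, dnf):
    """True if at least one clause has all of its literals satisfied."""
    def holds(kmer, op, threshold):
        freq = profile.get(kmer, 0.0)  # absent k-mers have frequency 0
        return freq > threshold if op == ">" else freq <= threshold

    return any(all(holds(*lit) for lit in clause) for clause in dnf)


print(satisfies_dnf(kmer_profile("GCGCGC"), HYPOTHETICAL_RULE))  # True
print(satisfies_dnf(kmer_profile("AAAAAA"), HYPOTHETICAL_RULE))  # False
```

Representing a model as plain data (clauses of threshold literals) keeps it human-readable, which matches the abstract's point that a handful of rules on a few k-mers can be enough.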

Springer - Publisher Connector

PubMed Central

Helsingin yliopiston digitaalinen arkisto

MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

Author: Babakir Mina Muhammed
Bertolazzi Paola
Cella Eleonora
Ciccozzi Massimo
Ciotti Marco
Felici Giovanni
Fiscon Giulia
Giovanetti Marta
Lo Presti Alessandra
Pierangeli Alessandra
Weitschek Emanuel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Continuous improvements in next generation sequencing technologies have led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists and whose analysis requires a huge computational effort. The classification of species has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple-alignment-based, phylogenetic-tree-based, statistical, and character-based methods.

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

FigShare

Combining EEG signal processing with supervised methods for Alzheimer’s patients classification

Author: Bertolazzi Paola
Bramanti Alessia
Bramanti Placido
Cialini Alessio
De Cola Maria Cristina
De Salvo Simona
Felici Giovanni
Fiscon Giulia
Weitschek Emanuel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Background: Alzheimer’s Disease (AD) is a neurodegenerative disorder characterized by progressive dementia, for which no cure is currently known. Patients affected by AD can be detected early by analyzing their electroencephalography (EEG) signals, which show reduced complexity, perturbed synchrony, and slowed rhythms. Methods: In this work, we apply a procedure that exploits feature extraction and classification techniques on EEG signals, with the aim of distinguishing patients affected by AD from those affected by Mild Cognitive Impairment (MCI) and from healthy control (HC) samples. Specifically, we perform a time-frequency analysis by applying both the Fourier and Wavelet Transforms to 109 samples belonging to the AD, MCI, and HC classes. The classification procedure consists of the following steps: (i) preprocessing of the EEG signals; (ii) feature extraction by means of the Discrete Fourier and Wavelet Transforms; and (iii) classification with tree-based supervised methods. Results: By applying our procedure, we are able to extract reliable, human-interpretable classification models that automatically assign patients to their class. In particular, using Wavelet feature extraction we achieve 83%, 92%, and 79% accuracy on the HC vs AD, HC vs MCI, and MCI vs AD classification problems, respectively. Conclusions: By comparing the classification performance of both feature extraction methods, we find that Wavelet analysis outperforms Fourier analysis. Hence, we suggest combining it with supervised methods for the automatic classification of patients based on their EEG signals, to aid the medical diagnosis of dementia.
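The Fourier and Wavelet feature-extraction step can be illustrated with a minimal, dependency-free Python sketch: mean band powers from a naive DFT, and per-level detail energies from a Haar wavelet decomposition. The Haar wavelet, the band limits in `BANDS`, the sampling rate, and all function names are assumptions for illustration; the abstract does not specify which wavelet family or which features the authors used.

```python
import cmath
import math


def dft_band_powers(signal, fs, bands):
    """Mean squared DFT magnitude of `signal` (sampled at fs Hz) in each [low, high) band."""
    n = len(signal)
    # Naive O(n^2) discrete Fourier transform, non-negative frequency bins only.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n // 2 + 1)
    ]
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    powers = []
    for low, high in bands:
        vals = [abs(c) ** 2 for f, c in zip(freqs, spectrum) if low <= f < high]
        powers.append(sum(vals) / len(vals) if vals else 0.0)
    return powers


def haar_detail_energies(signal, levels):
    """Energy of the detail coefficients at each level of a Haar wavelet decomposition."""
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = range(0, len(approx) - 1, 2)  # an odd trailing sample is dropped
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


# Assumed conventional EEG bands in Hz (delta, theta, alpha, beta); not from the paper.
BANDS = [(0.5, 4), (4, 8), (8, 13), (13, 30)]

fs = 64  # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s of a 10 Hz tone
print(dft_band_powers(signal, fs, BANDS))  # power concentrated in the alpha band
print(haar_detail_energies(signal, 3))
```

In practice a library FFT such as `numpy.fft.rfft` would replace the O(n²) loop; it is written out here only to keep the sketch self-contained. Feature vectors like these (for example, one set per EEG channel) are the kind of input a tree-based classifier would then be trained on.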

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza

Human polyomaviruses identification by logic mining techniques

Author: Alessandra Lo Presti
AM Gaynor
BL Padgett
Emanuel Weitschek
G Felici
Giovanni Felici
Guido Drovandi
H Feng
JD Thompson
Marco Ciotti
Massimo Ciccozzi
P Bertolazzi
Paola Bertolazzi
R Johne
S Tremolada
SD Gardner
T Allander
TA Hall
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Supervised DNA Barcodes species classification: analysis, comparisons and results

Author: B DasGupta
C Bonferroni
C Liu
CBOL Plant Working Group
CL Schoch
CM Bishop
CP Meyer
D Schindel
DP Little
E Weitschek
Emanuel Weitschek
F Austerlitz
F Wilcoxon
G Felici
GH John
Giovanni Felici
Giulia Fiscon
IN Sarkar
JC Platt
JS Farris
K Munch
KG Dexter
M Albu
M Hall
M Lou
N Saitou
P Bertolazzi
P Kuksa
PDN Hebert
R Meier
R Quinlan
R Van Velzen
S Ratnasingham
SF Altschul
T Lehr
WW Cohen
Publication venue: Springer Science and Business Media LLC

Crossref

DNA Barcoding of Recently Diverged Species: Relative Performance of Matching Methods

Author: A Bastos
A Edwards
A Skoracka
A Yassin
AJ Fazekas
AM Griffiths
AR Lemmon
B DasGupta
BN Reid
C Bonferroni
C Moritz
C Paredes-Esquivel
CH Hsieh
CP Meyer
CS McBride
CS McFadden
D Bickford
D Castillo
D Posada
DE Schindel
DJ Funk
DP Little
DV Nolan
E Paradis
E Weitschek
Emanuel Weitschek
F Austerlitz
F Wilcoxon
Freek T. Bakker
G Bucciarelli
G Felici
GE May
Giovanni Felici
HA Ross
IN Sarkar
Indra Neil Sarkar
J Azpurua
J Neigel
J Rach
JF Wallman
JFC Kingman
JP Huelsenbeck
K Munch
KF Armstrong
KG Dexter
L Kaila
LB Koski
LM Boykin
M Elias
M Friedman
M Hasegawa
M Kimura
M Lou
M Steel
M Virgilio
M Wiemers
MM Aveskamp
MV Matz
N Hubert
N Petit
N Saitou
P Bertolazzi
P Kuksa
PA Goloboff
PDN Hebert
PM Hollingsworth
PZ Goldstein
R DeSalle
R Floyd
R Hudson
R Lahaye
R Meier
R Nichols
R Nielsen
R Van Velzen
RD Ward
RJ Petit
Robin van Velzen
S Damm
S McKeon
S Ratnasingham
S Zou
SF Altschul
SG Newmaster
T-K Seo
TS Zemlak
V Dinca
WP Maddison
Z Abdo
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be problematic in some cases. Problems are mostly due to incomplete lineage sorting or simply lack of a ‘barcode gap’ and probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (similarity-based), and the diagnostic methods DNA-BAR, and BLOG. We analyzed simulated data assuming three different effective population sizes as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods, when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had highest correct query identification rate based on simulated (86.2%) as well as empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification
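Of the six methods compared, nearest neighbor is the simplest to illustrate. The Python sketch below assigns a query barcode to the species of its closest reference under an uncorrected p-distance. It assumes the sequences are already aligned, and the distance measure, the gap/N handling, the tie-breaking, and all names are illustrative assumptions rather than the settings used in the study; it does not reproduce the diagnostic methods DNA-BAR or BLOG.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences, skipping gap ('-') and N sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # site is uninformative for this pair
        compared += 1
        differing += x != y
    # With no comparable sites, treat the pair as maximally distant.
    return differing / compared if compared else 1.0


def nearest_neighbor_species(query, references):
    """Assign the query to the species of its closest reference barcode.

    `references` is a list of (species_name, aligned_sequence) pairs; ties go to
    the first reference listed.
    """
    species, _ = min(references, key=lambda ref: p_distance(query, ref[1]))
    return species


# Hypothetical toy reference library, not real barcode data.
refs = [("species_A", "ACGTACGT"), ("species_B", "TTGTACGA")]
print(nearest_neighbor_species("ACGTACGA", refs))  # species_A
```

Because the closest match always wins, this approach has no notion of a "barcode gap": a query from a recently diverged species can sit almost equally close to two species, which is where the abstract reports identification becoming difficult.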

Crossref

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

FigShare